UNIVERSITY OF PENNSYLVANIA, ESE TECHNICAL REPORT 



1 



Adaptive Algorithms for Coverage Control and 
Space Partitioning in Mobile Robotic Networks 

Jerome Le Ny, Member, IEEE, and George J. Pappas, Fellow, IEEE 



Abstract — This paper considers deployment problems where 
a mobile robotic network must optimize its configuration in a 
distributed way in order to minimize a steady-state cost function 
that depends on the spatial distribution of certain probabilistic 
events of interest. Moreover, it is assumed that the event location 
distribution is a priori unknown, and can only be progressively 
inferred from the observation of the actual event occurrences. 
Three classes of problems are discussed in detail: coverage 
control problems, spatial partitioning problems, and dynamic 
vehicle routing problems. In each case, distributed stochastic 
gradient algorithms optimizing the performance objective are 
presented. The stochastic gradient view simplifies and generalizes 
previously proposed solutions, and is applicable to new complex 
scenarios, such as adaptive coverage involving heterogeneous 
agents. Remarkably, these algorithms often take the form of 
simple distributed rules that could be implemented on resource- 
limited platforms. 

Index Terms — Coverage control problems, partitioning algo- 
rithms, dynamic vehicle routing problems, stochastic gradient 
descent algorithms, adaptive algorithms, potential field based 
motion planning. 



I. Introduction 

The deployment of large-scale mobile robotic networks 
has been an actively investigated topic in recent years ([T]- 
||3). Applications range from intelligence, surveillance and 
reconnaissance missions with unmanned aerial vehicles to 
environmental monitoring, search and rescue missions, and 
transportation and distribution tasks. With the increase in size 
of these networks, relying on human operators to remotely 
pilot each vehicle is becoming impractical. Attention is in- 
creasingly focusing on enabling autonomous operations, so 
that these systems can decide online how to concentrate their 
activities where they are most critical. 

A mobile robotic network should have the capability of 
autonomously deploying itself in a region of interest to reach 
a configuration optimizing a given performance objective (3] 
Chapter 5], Such problems can be distinguished based on the 
deployment objective, and among them the coverage control 
problem introduced by Cortes et al. [4] has proved to be partic- 
ularly important. In this problem, the quality of a given robot 
configuration is measured by a multicenter function from the 
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locational optimization and vector quantization literature (5), 
|6|. A distributed version of the Lloyd quantization algorithm 
fll allows a robotic network to locally optimize the utility 
function in a way that scales gracefully with the size of the 
network HI. The basic version of the coverage control problem 
has inspired many variations, e.g., considering limited commu- 
nication and sensing radii (8), (9), heterogeneous sensors flO) , 
obstacles and non-point robots fTT) , or applications to field 
estimation problems | 12 1. It is also related to certain vehicle 



routing problems, notably the Dynamic Traveling Repairman 



Problem (DTRP) p3|-[ 15 1, as discussed by Frazzoli and Bullo 
in (16) and several subsequent papers flT) , |18|. Another 

(201, 



related problem is the space partitioning problem p9) 
where the robots must autonomously divide the environment 
in order to balance the workload among themselves. 

In essentially all the previously mentioned applications, the 
goal of the robotic network is to respond to events appearing 
in the environment. For example in the DTRP, jobs appear 
over time at random spatial locations and are serviced by 
the mobile robots traveling to these locations. The utility 
function optimized by the network invariably depends on 
the spatial distribution of the events, and the optimization 
algorithms require the knowledge of this distribution |4|, 1 16 1, 
(19), (20) . Hence they are not applicable in the commonly 
encountered situations where the robots do not initially have 
such knowledge but can only observe the event locations over 
time. It is then natural to ask how to gradually improve the 
spatial configuration of the robotic network based only on 
these observations. Indeed, recently some coverage control 
algorithms |12|, |21| and vehicle routing algorithms |18|, 
[22 1 have been developed to work in the absence of a priori 
knowledge of the event location distribution. 

An essential idea of our work is that deployment prob- 
lems with stochastic uncertainty can be discussed from the 
unifying point of view of stochastic gradient algorithms, 
thereby clarifying the convergence proofs and allowing to 
easily derive new algorithms for complex problems. In this 
paper we restrict our attention to three related classes of 
problems: coverage control, spatial partitioning, and dynamic 
vehicle routing problems. For these three applications, we 
derive distributed stochastic gradient algorithms that optimize 
the utility functions in the absence of a priori knowledge 
of the event location distribution. We call these algorithms 
adaptive, in analogy with the engineering literature on adaptive 
systems (23) . Remarkably, the algorithms we describe often 
take the form of simple rules, in fact typically simpler than 
the corresponding non-adaptive algorithms. Hence they are 
easier to implement on small platforms with limited sensing, 
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computational and communication capabilities. 



Specifically, we first discuss in Section III certain stochastic 
gradient algorithms that adaptively optimize coverage con- 
trol objectives. These algorithms generalize to new complex 
multi-agent deployment problems and we justify this claim 
by developing solutions to coverage control problems in- 
volving Markovian event dynamics or heterogeneous robots. 
Additional application examples, including deployment under 
realistic stochastic wireless connectivity constraints, can be 



found in [24], [25 1. In Section IV we describe new adaptive 
distributed algorithms that partition the workspace between 
the robots in order to balance their workload, using only 
the observation of the past event locations. These algorithms 
exploit the link between generalized Voronoi diagrams and 
certain Monge-Kantorovich optimal transportation problems 
[26|-[28|. Finally in Section [V] we present the first fully 
adaptive algorithm for the DTRP. In light traffic conditions, the 
policy reduces to the coverage control algorithm of Section III 
and is simpler than the previous algorithm presented in (22) . In 
heavy traffic conditions, it relies on the partitioning algorithm 



of Section IV This algorithm complements the recent work 
of Pavone et al. fl8) , in which the knowledge of the event 
location distribution is required in the heavy traffic regime. 



II. Preliminaries 



A. Notation 



We denote [n] := {l,...,n}. Throughout the paper all 
random elements are defined on a generic probability space 
(il,J-,P), with the expectation operator corresponding to 
P denoted E. We abbreviate "independent and identically 
distributed" by iid, and "almost surely" by a.s. We denote 
the Euclidean norm on R q by || ■ ||. 

Let (X, d) be a metric space. For a set S C X, we denote 
the indicator function of S by Is, i.e., ls{%) = 1 if x € S and 
ls(x) = otherwise. For xo £ X, the Dirac measure at xo is 
denoted by S Xa and defined by S Xo (S) — 1s(xq) for all Borel 
subsets S of X. We denote the distance from a point x £ X 
to a set S by d(x, S) :— inf yeS d(x 7 y), and we set d(x, 0) = 
+oo. A sequence of points {xk}k>o m X is said to converge 
to a set S C X if d(xk, S) — > as k —> oo. For nonempty 
sets B,C C X, the Hausdorff pseudometric is defined by 
dn(B,C) :— max(sup 2;gB d(x, C), sup xeC d(x, B)). The 
ball of radius r around S C X is B(S, r) :— {x £ 
X\d(x,S) < r}. Also, B({x},r) is just denoted B(x,r). 

A solution of a differential equation x — h(x) or of a 
differential inclusion x £ H(x) |29J is interpreted in the sense 
of Caratheodory, i.e., as an absolutely continuous function x{t) 
satisfying 

x(t) = xo + / y(s) ds, for all t £ K,with y(s) = h(x(s)) 
Jo 

or y(s) £ H(x(s)) for all s. 

Finally, a set I is invariant under a differential inclusion x £ 
H(x) if for all xo £ I, there exist some solution x(t),t £ 
(— oo, oo), with a;(Q) = xq, that lies entirely in /. 



B. Robot Network Model 

We consider a group of n robots evolving in R q , for some 
q > 1. We denote the robot positions at time t £ ]R>o by 
p(i) = \pi(t), ■ ■ ■ ,p n (t)] £ (M. q ) n . For simplicity, we assume 
that the robots follow a simple kinematic model 



Vi e [n],Vt £ M>o, Pi(t) = Ui, \ui(t)\ < Vi 



(1) 



where Vi is a positive constant and itj is a bounded control 
input. However, more complex dynamics could be considered 
since our analysis only involves the positions of the robots at 
certain discrete times, see, e.g., ([17}. In addition, the robots 
are assumed to perform computations and to communicate 
instantaneously. Finally, we define 



D,, 



[p = [Pi 



(2) 



Pi 



Pj for some 1 < i < j < n\. 



Hence D„ is the set of configurations where at least two robots 
occupy the same position. 

C. Geometric Optimization 

For a vector p = [pi, . . . ,p n ] £ (M 9 ) n \ D„, we define the 
Voronoi cell of the point p, by 



Vi(p) = { 



z £ 



\z-pi\ < \\z - 



Pj\\,Vj £ [n]|. 



That is, Vi is the set of points for which robot i is the closest 
robot for the Euclidean distance. The Voronoi cells of the 
points divide M. q into closed convex polyhedra, and {Vi}j S [ n ] 
is called a Voronoi diagram pO) . Two points pi and pj or 
their indices i,j (with i ^ j) are called Voronoi neighbors if 
the boundaries of their Voronoi cells intersect, i.e., if Vi(p) fl 

For a function c : R q x R q —> M, a vector w = 
[w u .. .,w n ] £ R n , andp = [p!,.. . , Pn ] £ (W) n \D n , define 
for all i £ [n] the generalized Voronoi cell of the pair (p,j, Wi) 
with respect to c by 



" t 



'(p,w) = | 



Z £ 



c{z,pi) - Wi < c(z,pj) - uij, (3) 
Vj€[n]}. 



We also write Vf(G,w) := V i °(p ) w) for the set Q = 
{pi, . . . ,p n }- The point p, is called the generator and Wi 
the weight of the cell Vf(p,w), and {V, c }i e [„] a generalized 
Voronoi diagram. Intuitively c(z,p) represents a distance or 
cost between the points z and p, and in practice takes the 
form c(z,p) — f(\\z — p||), with / an increasing function. In 
particular for f(x) = x 2 , the generalized Voronoi diagram 
is called a power diagram [30], (31) , and the generalized 
Voronoi cell a power cell. Like Voronoi cells, power cells are 
polyhedra, although possibly empty pT) . Notice from ([3]l that 
for a given configuration p, the size of a generalized Voronoi 
cell of a pair increases as its weight increases with respect to 
the weights of the other pairs. Similarly to Voronoi neighbors, 
we define generalized Voronoi neighbors and power diagram 
neighbors. 
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D. Min-consensus 

At several occasions, we need to solve the following prob- 
lem in a distributed manner in the robotic network. Robot 
i, for i £ [n], is associated to a certain quantity di £ R, 
which can be +00. Each robot must decide if it belongs to the 
set argmirij e [ n ] di. For simplicity, we assume that each robot 
can communicate with some other robots along bidirectional 
links in such a way that the global communication network is 
connected. We also assume that the robots know the diameter 
of the network, denoted diam. Alternatively, they know the 
number n of robots in the system, in which case we take 
diam = n below. 

In a synchronous network the problem can be solved by 
the FloodMin algorithm |32j section 4.1.2]. Each robot 
maintains a record in a variable d± of the minimum number it 
has seen so far, with d± = di initially. At each round, it sends 
this minimum to all its neighbors. The algorithm terminates 
after diam rounds. The agents that still have di = di at the 
end know that they belong to argmin ig [„] d,. This algorithm 
can also be implemented in an asynchronous network by 
adding round numbers to the transmitted messages [32] section 
15.2]. 

III. Adaptive Coverage Control Algorithms 

A. Coverage Control for Mobile Robotic Networks 

In the standard coverage control problem [4], the goal of 
the robotic network is to reach asymptotically a configuration 
where the agent positions lim^oo pi (t), i £ [n] , minimize 
the following performance measure capturing the quality of 
coverage of certain events: 



min/dlft - z \\) 



(4) 



where / : M>o —> IR>o is an increasing continuously 
differentiable function. The random variable Z represents the 
location of an event of interest occurring in the workspace. To 
interpret Q, the cost of servicing an event at location z with a 
robot at location pi is measured by f(\\pi — z\\), and an event 
must be serviced by the robot closest to the location of this 
event. For example, in monitoring applications, f(\\pi — z||) 
can measure the degradation of the sensing performance with 
the distance to the event Q. In vehicle routing problems, this 
cost might be the time it takes a robot to travel to the event 
location, i.e., f{\\pi — z\\) — \\pi — z\\/vi, assuming enough 
time between successive events, see Section [V] 

For simplicity, we assume in this section that the probability 
distribution ¥ z := P o Z^ 1 of Z has support contained in 
a compact convex set Q with nonempty interior. We also 
generally make the following assumption. 

Assumption 1. Hyperplanes in R q have ¥ z -measure zero. 

Note that Assumption [TJ implies that points also have 
measure zero, and in particular the support of ¥ z is infinite. 
The following result, whose proof can be found in Appendix 
[A] provides an expression for the derivatives of £ n , useful 
for optimization purposes. Throughout the paper d£ n /dpi for 
Pi £ R q denotes the q-dimensional vector of partial derivatives 



with respect to the components of pi. We also adopt the 
convention 0/||0|| := 0. 

Proposition 1. Under Assumption [7] £ n is Lipschitz con- 
tinuous on compact sets and continuously differentiable on 
(M. q ) n \ D„, with partial derivatives 

~ = I f(\\ Pi -z\\)^^F z (dz). (5) 

"Pi P JVi{p) \\Pi - z \\ 

Now let us suppose, as in [4| and most of the subsequent 
literature, that the event location distribution ¥ z is known. 
Using ([5]), one can then implement a gradient descent algo- 
rithm to locally minimize the objective Q [4|. Assuming for 
simplicity that the agents are synchronized, and a constant 
sampling period T > 0, we denote the agents positions at 
time kT by p k :— p{kT) — \pi,k, ■ ■ ■ ,Pn,k\- The robots start 
at po = [pi,0) ■ ■ • ,Pn.o] at t = and update their positions 
according to 

d£ 

Pi,k+i = Pi,k - Ik 



d P i 



(6) 



where 7^ is an appropriately chosen sequence of decreasing or 
small constant positive stepsizes. We ignore for the moment 
the issue of non-differentiability on D n as well as the minor 
modifications required to accommodate velocity constraints in 
d§J. The agents can implement |6]) to asymptotically reach 
a configuration that is a critical point of £ n . No guarantee 
to reach a global minimum is offered in general, and indeed 
global minimization of the function (|4]) can be difficult [33) . 
Nevertheless, an interesting property of the gradient descent 
algorithm |6]) for the coverage control problem is that it can 
be implemented in a distributed manner by the robots, by 
exploiting the form of the expression |5]). In particular, each 
agent can update its position at each period according to ([6]) 
by communicating only with its current Voronoi neighbors, 
in order to determine the boundaries of its own Voronoi cell 
Vi(p) and compute the integral d5). Even in a large network, 
a single robot has typically only few Voronoi neighbors 1 3 1 1 , 
which allows for a scalable and distributed implementation of 
the gradient descent algorithm. 

Remark 1. The specific case where f(x) = x 2 is considered 
in |4| in more details. In this case |5]l gives 

fl JL |p=p»=2P,(V r < (pfc))p i , fc - / z¥ z {dz). (7) 

"Pi JVi(p k ) 

Assuming that ¥ z (Vi(pk)) 7^ 0, define the centroid of the 
Voronoi region Vi(pk) as 

1 



Cvi(Pk) V,(.Vi(p k )).n „„, 
Then control law ((6), i.e., 

d£„ 



z¥ z (dz). 



Pi,k+1 = Pi,k - Ik 



dpi 



= Phk - 2 lk ¥ z (V l (p k ))( Pt . k - C Vi(pk) ), (8) 

is essentially the well-known least-squares quantization algo- 
rithm of Lloyd |7) . 

Note that the computation of the updates (|6|l requires ¥ z to 
be perfectly known. The minimization of (H| is then essentially 
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an open-loop optimization problem, and the network can reach 
its desired configuration before any event occurs. However, 
the algorithm does not provide any mechanism to adapt the 
configuration based on the actual observations of where the 
events occur, which is critical in practice as F z is rarely 
available. In the next section, we show how to generally 
address this issue by using stochastic gradient algorithms. 



Subsection III-C applies the method specifically to the adaptive 



coverage control problem. 



B. Stochastic Gradient Algorithms 

Suppose that we wish to minimize a function G defined on 
K m for some m > 0, of the form 

G{x) = E[g{x,Z)} = f g(x,Z(cj))P(duj) 
Jn 



g(x,z)F z (dz), 



(9) 



such as £ n defined in Q for example. The space Z in |9]l is 
the range of the random variable Z. In contrast to the previous 
subsection, we now assume that P 2 is unknown, so that the 
expectation Q cannot be computed directly. Suppose that g is 
differentiable with respect to x, for P 2 -almost all z, and denote 
its gradient \7 x g(x, z) :— d ■ Finally, assume that we can 
observe random variables Z k ,k > 1, iid with distribution P 2 . 
Consider then the stochastic recursive algorithm 



Xk+i =xk — ik^ x g(xk, z k +i), 

which can be rewritten in the form 

x k+ i = x k + j k {h(x k ) + Dfc+i), 



(10) 



(11) 



with h{x) := —E[V x g(x,Zi)\x] and D k +i = 
-V x g(x k , Zk+i) + E[V x g(xk,Z k+1 )\x k ]. Define for 
k > the filtration T k '■— cr(xo,Di,l < i < k), i.e., an 
increasing family Tk C Fi for k < I of sub-c-algebras of T '. 
Then {D k } k >i is a martingale difference sequence (MDS) 
with respect to {J 7 k } k >o, as explained in the following 
definition. 

Definition 1. Let {T k }k>o be a filtration. A sequence of 
random variables {D k } k >i is called a martingale difference 
sequence with respect to {Tk}k>o if D k is measurable with 
respect to F k , E\^D k \f\ < oo, and we have E[D k \F k -{\ = 0, 
for all k > 1. 

Intuitively, the MDS {D k }k>i plays the role of a zero- 
mean noise. By the ODE method |34) , we can expect that 
asymptotically, under the condition 



Ik > 0, 



fe=0 



+0O, 



fc=0 



(12) 



on the stepsizes, which holds for j k = 1/(1 + k) for example, 
the sequence {x k } k >o in (Hi almost surely approaches the 
trajectories of the ODE 



Now assume that it is valid to exchange expectation and 
derivation, as follows 



-VG{x) = -VE[g(x,Z 1 )\x] 
= -E[V x g[x,Z x )\x] 



h{x) 



(14) 



Identity ( f]~4| > can often be proved using the dominated con- 
vergence theorem, see, e.g., [35] Theorem 5.1]. Then the 
ODE (13 1 describes a gradient flow and so in fact under 



mild assumptions the trajectories and therefore the sequence 
{x k } k >o approach the critical points of G. Moreover, we can 
often expect convergence to the set of local minima of G 
almost surely [36, chapter 4]. In conclusion, the algorithm 
(111 allows us to reach such a minimum even though P z is 
unknown, as long as we have access to realizations of the 
random variables Z k - 

We now capture the intuition above more formally, includ- 
ing the situation where the function G is not everywhere differ- 
entiable, as in Proposition [T] Consider a stochastic algorithm 

x k+ i = x k + j k (h k + D k+ f), Vfe > 0, (15) 



where the stepsizes j k satisfy (12i, {D k } k >i is an MDS with 
respect to the filtration F k := \xu ht,Di,l < k}, k > 0, and 
h k is specified in the following theorems. 

Theorem 1. Assume that G is continuously differentiable on 
M. m \ S, with S a set of Lebesgue measure zero. Introduce the 
Filipov set-valued map [29] 



H{x) 



n 



<5>0 ' 



(U* 6 b(x,«)\s{-VG(x)} 



x <£ S, 
x £ S, 



(16) 

where co denotes the closed convex hull. Consider the recur- 



rence (15\ with h k £ H(x k ), for all k > 0. Assume that for 
some positive constants Ki , K 2 we have 



sup \\h\\ < Jfi(l + ll^ll), Va; e 
E[\\D k+1 \\ 2 \F k ]<K(l + \\x k \~ 



a.s. ,Vfc > 0, 



and that sup fc>0 ||^fc|| < oo, o.s. Then the sequence {xk}k>o 
converges almost surely to a connected subset of {x € K m \ 
S|VG(a;) = 0} U S, invariant for the differential inclusion 
x € H(x). 

Theorem 2. Assume that G is convex and admits a minimum 
on R m . Consider the recurrence ( |75| ) with h k a subgradient 
of G at Xk, for all k > 0. Assume that there exists a positive 
constant K such that ElWhk+Dic+il^lTk] < K, for all k > 0. 
Then the sequence {xk}k>o converges almost surely to some 
point minimizing G. 

The proofs of these theorems are standard and not repeated 
here, see (37), (36| chapter 5], (38) Proposition 8.2.6. p. 480] 
and the proof of Theorem [3] in Appendix A-C Note that in 
many applications, the stepsizes j k are chosen to converge to a 



x = h(x). 



(13) 



small positive constant instead of satisfying ( 12 1, which allows 
tracking of the equilibria of the gradient flow if the problem 
parameters (e.g., P z ) change with time. In this case, one 
typically obtains convergence to a neighborhood of a critical 
point [23 1. The selection of proper stepsizes is an important 
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practical issue that is not emphasized in this paper but is 
discussed at length in references on stochastic approximation 
algorithms (35), (37). 



C. Adaptive Coverage Control 

We now consider the following modification of the coverage 
control problem. The events occur randomly in the workspace, 
with event k > 1 occuring at time tk > and location Zk £ Q. 
We let to := denote the initial time. Assume in this subsec- 
tion that the successive locations of the events Zk,k > 1, are 
iid with probability distribution ¥ z on Q. The distribution P z 
is now unknown, and as a result the deterministic gradient 
descent algorithm |6]l cannot be implemented. We work under 
Assumption [T] so that the gradient expression |5]l holds. 

We denote the agent positions at time ijjT, i.e., right 
before the occurrence of the k th event, by Pk-i = 
\Pl k-i> ■ ■ ■ >Pn k-i] <= (IR 9 )", for k > 1. These positions are 
called reference positions and are updated according to 

Pi,k+i = Pi,k+Ui,k, \ui,k\ < "i.fc, Vfc € Z> ,Vi € [n], (17) 

where Ui t k £ K' is a control input for the interval [tk,tt+\). 
For example, if the robot dynamics follow the model ([T]l and 
if servicing the targets requires no additional travel, we can 
take Vi t k = Vi(tk+i — tk) for all i £ [n]. We assume that there 
exists a constant v > such that v^k > v for all i £ [n] and 
k > 0, so that the robots can update their reference positions 
by a non-vanishing positive distance at each period. 

When the k th event occurs at time tk and position Zk £ Q, 
we assume that at least the robot closest to that event lo- 
cation can observe it. This robot, say robot i, services the 
target starting from its location p< fc_i, and then moves to a 
new reference position p^fe. The following reference position 
updates implement the stochastic gradient algorithm 
minimize the coverage objective (4). First, for a vector 
and a scalar b > 0, define the truncation [sat(u)]b by 



TO} to 

i 



■ u £ 



[sat(u)] 6 = 
Then consider the update rule 

PiM+1 



if || M || < b, 
if Hull > b. 



(18) 



n Q 



Pi,k + sat 



j k f(\\ P , k -z k+1 \\) ] ^^ 

if i £ argmin ie [„] \\pj >k - Z k+ i 



^Pi^k otherwise, 



Moreover let us define (dg/dp)(p,z) arbitrarily for z on 
the Voronoi cell boundaries and at the points pj. These sets 
have P z -measure zero under Assumption [T] and hence do not 
contribute to the integral 

dg 



E 



dp, 



/'(Hp 

Vt(p) 

d£ n , . 

"H— (P) 

dpi 



4) 



Pi 



WPi - z\ 



for p <£ D„. 



In other words, Proposition [T] precisely says that the identity 
(14| is valid for E n in (R 9 )™\D„. Note also that almost surely 



the update rule ( 18 i never results in two robots landing on the 
same position as long as q > 2 and the updated reference 
position before projection remains in Q, because this would 
require Zk+i to fall on the line passing through these two 
robot reference positions. This can be achieved for q = 1 or 
for a reference position projected on the boundary of Q as 
well, by a small random perturbation of the sequence 7^ [36, 
Chapter 2]. Hence we can assume in the following that almost 
surely pk ^ D„ for all k > 1. Moreover, the projection ITq 
and the saturation nonlinearity do not change the convergence 
properties of the algorithm, see Appendix [A] Therefore, (p"8j ) 
is essentially the stochastic gradient descent update rule UQt- 
It is interesting to compare the implementation complexity 



of algorithm ( 18 1 with that of the corresponding deterministic 
gradient descent update based on d6l. The deterministic, 
model-based algorithm requires that each agent maintains 
communication with its Voronoi neighbors and knows their 
position in order to determine the boundaries of its Voronoi 
cell and compute the integral ([5]). Even in the quadratic case 
(jHJ, this scheme can be difficult to implement. In contrast, 
no Voronoi cell computation or integration and no detailed 



knowledge of the position of the neighbors is required by ( 18 1, 
which only needs a distributed mechanism to find which robot 
is the closest to the target when it appears. This can be done 
in a distributed way via the FloodMin algorithm described 
in Paragraph |II-D| with the agents initializing their value to 
di = \\Pi,k ^ Zk+i\\ if me Y detect the event, and to dj = +00 
if they are too far away to detect it. Clearly there are other 
ways to implement the rule ( 18 I. For example, we could let 



all the robots travel to the event location at the same speed, as 
in p2) , a scheme that does not require any coordination. Then 
only the first robot to reach the target changes its reference 
position for the next period. 

Special Cases: If we specialize (ff8]l to the least-squares 



where IIq is the orthogonal projection on the convex set 
Q. Note that the situation where several robots are at equal 
distance from Zk and simultaneously update their reference 
position occurs with probability zero under Assumption [T] To 
justify ( ff8] l based on the discussion in the previous subsection, 
let g(p, z) = min ie[n] f(\\ Pl ~ z\\), i.e., g(p, z) = - update rule 

z\\) for z £ Vi»{p). Then we have 

dg 



coverage control problem with f(x) — x and ignore the 
saturation function, we obtain the update Pi.k+i = Pi,k + 
7fc(Zfc + i —pi : k) for the closest robot. This particular adaptive 
algorithm has been used extensively in various fields, from 
statistics to quantization to neural networks (5), (39) , |40|. 
If f(x) — x and all robots travel at unit speed, the service 
cost for an event appearing at Zk is the time it takes for the 
closest robot to travel to the event location. In this case, the 

z *+i~p^ for 



dp., 



(P» z) 




if zeInt(^(p))\{Pi} 
if z£Vi(p). 



18 1 is simply p hk +i = Pi,k + Ik \\z k+1 - P .. „ 
the closest robot. It is used in the vehicle routing application 
discussed in Section IVl 

Remark 2. For certain distributions and initial robot positions 
outside of the support set of the distribution ¥ z , it is possible 
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that by following < pT8j >, some agents never move. The issue 
also arises with the deterministic algorithm, since the gradient 
^ vanishes if P z (Vi(pk)) = 0. A possible solution to avoid 
this phenomenon is to add an initial transient regime where 



for example all agents follow the first case of the rule (18 1 
rather than only the closest agent. The goal of this transient 
modification is to bring all the robots within the support set of 
the event distribution. It is either stopped at some finite time 
or discounted by a stepsize decreasing much faster that jk, 
thereby not impacting the convergence results |36|. 



We now state a convergence result for the update law (18 1 
to the set of critical points of the objective £ n , i.e., to 



{p€ Q n \D n \V£ n ( P ) =0}. 



(19) 



Even though the algorithm is a stochastic gradient algorithm, 
the discontinuity of V£ n on the set D„ creates technical 
difficulties. To the best of our knowledge, the most thorough 



investigation of the dynamics of (18i can be found in [41] 
and leaves open the question of non-convergence to D„. In 
contrast to that paper, we cope with the non-differentiability 
on D„ by introducing the Filippov set-valued map H n as in 

r {-VE n {x)}, x(£D„, 

v n«5>0 5 ° (U*6B(« I 4)\D n {-V£n(4)}) , X <E D n . 

(20) 

We also need the following definition. A Borel measure /i on 
R q is said to dominate the Lebesgue measure A if X(A) = 
for all Borel sets A such that p,(A) = 0. 



H n (x) 



Theorem 3. Let the stepsizes 7& satisfy ( 12 i, po S Q n \ D„, 



and suppose that Assumption [7] holds. Then, by following 
the algorithm {18), the sequence {pk}k>o of robot positions 



converges almost surely to a compact connected subset of 
C n U (D„ H Q"), invariant for the differential inclusion p G 
H n ( P ). 

If in addition V z dominates the Lebesgue measure on Q, 
then the sequence {pk}k>o converges almost surely to a 
compact connected subset of C„. In particular if £ n has only 
isolated critical points in Q™ \ D n , the sequence {pk}k>o 
converges to one of them almost surely. 

The proof of Theorem [3] can be found in appendix [A] 
The first part of the theorem is a fairly direct application of 
Theorem [T] but does not rule out asymptotic convergence to 
the set D n of aggregated configurations. This motivates the 
second part of the theorem. 

D. Some Extensions 

Before closing this section, we briefly illustrate how the 
stochastic gradient view leads to simple solutions for interest- 
ing variations of the coverage control problem. 

1) A Heterogenous Coverage Problem: As in Subsection 



III-C an event appears randomly in the environment at each 
period and must be serviced. However, let us now assume that 
there are two types of agents, with m A robots of type A and 
rriB robots of type B, and three types of events: a, b, and ab. 
Events of type a must be serviced by a robot of type A, events 



of type B by a robot of type b, and events of type ab by a 
robot of type A and a robot of type B. When a new event 
appears, it is of type a with some unknown probability A Q , 
a E {a, b, ab}, and the agents can observe its type. The spatial 
distribution P Q of events of type a is also a priori unknown, 
and satisfies Assumption [T] Finally, denote the vector of robot 
positions p = [p^ , . . . ,p^ A ,pf , . . . ,p^J. The asymptotic 
configuration of the robots must now optimize the expected 
cost 



f 



,(p) = A a £ 



rnin f A (\\p? - Z\\ 



(21) 



mm J B (\\pf - Z\\) 



mm J A { 



\Pi 



z \\), min / B l 



= b 



Wvf 



^abE 



-z\\) 



max 



= ab 



where f A and fg are increasing, continuously differentiable 
functions with values in R>q. Note that the cost of servicing 
an event of type ab is the maximum of the costs of servicing it 
with one robot of each type. This can model the time necessary 
for one robot of each type to travel to the event location for 
example. 

For this problem, one can verify as before that the stochastic 
gradient update rule ( [T0| takes the following surprisingly 
simple form. When an event of type a appears at location z^+x, 
the closest robot of type A, say i, services it and changes it 
reference position by moving it toward Zk+i by a (truncated 

and projected) step 7kf A (\\zk+i-pi;k\\) i|^I%Cn as in 



18 



and similarly for a target of type b and a robot of type B~. 
the target is of type ab, the closest A and B robots service 
it. To update their reference positions for the next period, 
they first find which of the two is the farthest from the event 
location. Then only this robot moves its reference position by 



the same step as in ( 18 I. In view of the complicated expression 



of the objective function, such a simple rule based update law 
is quite appealing. We illustrate its behavior on Fig. [T] for 
Ia(x) = f B (x) = x. 

2) Target Tracking with Markovian Dynamics: Suppose 
now that we wish to track a single target in discrete time, 
whose position at time tk is Zk, where Zu evolves as a 
Markov chain with a unique stationary asymptotic distribution 
P 2 . The objective is still to optimize £ n defined by Q, which 
represents the steady-state tracking error. We can then use 
algorithm ( |T~8] > to optimize the steady-state robotic network 
configuration, and a convergence result similar to Theorem 
[3] can be proven using stochastic approximation arguments 
| |23| Chapter 1]. This tracking scheme does not require the 
knowledge of the target dynamics nor that of the stationary 
distribution F z . 

As an example, consider a target moving on a circle of 
radius R, with dynamics 

0fc+i=O.95 fc +£fc, 

where the variables are iid uniform on [—0.5,0.5] and 
Zk = [Rcos8k, Rs'm8i c ] T . The result of the adaptive cov- 
erage algorithm for f(x) = x 2 is shown on Fig. [2] Although 
the target distribution does not dominate the Lebesgue measure 
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(a) 



k= 1 



k = 5000 



. - ,?"■** ""' V''"' * . 




50 100 



150 200 250 

target ab number 



Fig. 1 . Heterogeneous coverage control for a system with two types of robots, 
A (green circles) and B (gray squares). Events requiring service from type 
a appear with probability 30% and a distribution approximately centered at 
[20; 20] T (star on Fig. Mia)). Targets of type b appear with probability 30% 
and a distribution approximately centered at [8; 20] T (cross on Fig. Ilia)). 
Finally targets of type ab appear with probability 40% and a distribution 
approximately centered at [20; 8] T (triangle on Fig.^a)). Fig.ffla) shows the 
initial robot configuration and Fig. [TJb) the configuration reached after 1000 
targets, together with the history of target locations. The Voronoi cells of 
each robot are indicated but not computed by the algorithm (separate Voronoi 
diagrams are drawn for the two robot types). Note how robots of type A and 
B tend to pair in the lower right corner in order to service the targets of type 
ab efficiently (here }a{ x ) = Ib( x ) = x )- Fig- Pl c ) shows the empirical 
average cost incurred by the targets of type ab, wnere the average is taken 
over all the past targets of this type seen so far. 



as required in the second part of Theorem [3] in practice we do 
not observe convergence to an aggregated configuration. Note 
how the robots position themselves in the region around the 
point [1, 0] T where the target spends most of its time. 

IV. Adaptive Spatial Load-Balancing and 
Partitioning 

In this section, we design distributed adaptive algorithms 
that partition the space into n cells, one for each robot, so that 
the steady-state probability that an event falls into cell i has a 
prespecified value a,;. Here we have <n > for all i £ [n], and 
2~2?—i a i = !• These algorithms allow an operator to specify 
the steady state utilization of the different robots, by letting 
each robot service only the events occurring in its cell. Such 
spatial load balancing algorithms have important applications 
in multi-robot systems and location optimization, see e.g. [ 19 1, 
[20 1, [42|. An application to the DTRP is described in Section 



m 

As in Section |HI-C[ events occur at times tk and iid 
locations Zk,k > 1, and the unknown distribution V z has 
support included in Q. In this section, Q is assumed to be 
compact for simplicity, but not necessarily convex. Based on 
the observation of the successive event locations, we design a 
sequence of partitions of Q into regions {Ri,k}ie[nh k > 0, 



(a) 



5 20 25 30 




1000 1500 2000 

period number 



Fig. 2. Adaptive coverage algorithm for a target with Markovian dynamics 
moving on a circle. We show on Fig. |5Ja) the positions of the robots (blue 
circles) and the target (red cross) initially and after 5000 time-steps. The 
stepsizes used were 7^ = l/(l + 5x 10~ 3 fc). The curve on Fig. |5Jb) shows 
the evolution of the empirical average cost over time, where the average is 
taken over the past 1000 cost measurements. 



such that at period k > 1, agent i is responsible for servicing 
the event if and only if Zk € Ri,k-i- Here we slightly abuse 
terminology and allow our partitions to have Ri k l~l Rj,k 7^ 
for i 7^ j. Our algorithms produce regions whose intersections 
have P z -measure zero, hence this has no influence on the final 
result. After the k th event occurs, the agents can change the 
boundaries of their respective regions to form the partition 
{Ri,k}i£\n] used to decide which agent services the (k+ \) th 
event. Our sequence of partitions {Ri.k}ie[ n ] converges to a 
partition {Ri}ie[ n ], i- e -, dH(Ri,k,Ri) — > as k — > 00, such 
that F z (Ri) = at for all i £ [n]. 

Let Q = {gi, . . . ,g n } be a set of n fixed and distinct points 
in W, with point gi associated to robot i. We call the point 
gi the generator of region Ri. Designing a partition {Ri}n=.\ n ] 
is equivalent to choosing an assignment of event locations 
to region generators, i.e., a measurable map T : Q — > Q, by 
taking R4 = T 1_1 (.9i), i € [n]. Let us denote the set of all such 
assignments by T. We then look for an assignment T € T 
satisfying the constraint ¥ z (T^ 1 (gi)) = a,;, i € [n], and design 
recursive algorithms producing such an assignment asymptot- 
ically. Now consider the following optimization problem 



inf 

TET 



subject to 



c(z,T(z))V z (dz) (22) 
s (T- 1 (g i ))=a i , ie [n], (23) 



where c:Qx§->Risa given cost function. The following 
theorem gives a general way of producing assignments or 



partitions that optimize (22i, (23 1 



Theorem 4. Consider problem (22\ ( 23 \, where Q is compact, 
and assume that 
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Al) For all i £ [n], z — > c(z, Qi) is lower bounded and lower 
semi-continuous on Q, and z — > maxj e r n i c(z,gi) is V z - 
integrable. 

A2) For all i ^ j £ [n], for all r £ R, the set {z £ Q : 

c(z,gi) — c(z,gj) — r} has V z -measure zero. 
Then the problem admits an assignment T £ T that attains 
the infimum in ( |22[ ). The value of the optimization problem is 
equal to 

« n 

max := / min{c(2, — Wi} P z (dz) + } ajtUj, 

toGR™ jQie[n] 

(24) 

and this maximum is attained for some w* £ R n . An optimal 
assignment T is then given by the generalized Voronoi regions 

VzeQ, T(z) = g l ^z£V l c (Q,w*). 

The function h is concave, and a supergradient of h at w is 
given by 



,XVZ{Q,w)) + a n 



(25) 



Finally, the following supergradient optimization algorithm 



w = 0, 

Wi,k+i = Wi,k +lk{-V z (V l c {Q,w k )) + a,], i= 1, . 



.,N, 
(26) 



where jk is a sequence of stepsizes satisfying ( 12 I, converges 
to an optimal set of weights maximizing h. 

In other words, there is a set of weights w* £ ffi™, 
maximizing the dual function defined in p4|, for which the 



corresponding generalized Voronoi cells \V?(Q, u>*)}j6[ n ] de- 
fined in ([3]) satisfy the constraints of interest ( |23| . In addition, 
the assignment corresponding to these regions minimizes the 
expected cost ( |22| . In practice, we make additional assump- 
tions on the function c to obtain reasonably shaped regions. In 



particular, if c(z, gi) 



-QiW , then the generalized Voronoi 



diagram becomes a power diagram. Because the boundaries of 
the power cells are hyperplanes in R q ]30) , our Assumption 
AG] on P 2 in Theorem [4] is satisfied under Assumption [T] 
Theorem [4] generalizes some results in [19], [20], [42 1 by 
imposing weaker conditions on P 2 and c. A proof is provided 
in Appendix [B] based on results from optimal transportation 

ED-ED. 

For our scenario where P 2 is unknown, we replace the 



gradient ascent algorithm (26i by a stochastic version pre 



sented in Algorithm [T] and whose behavior is illustrated 
on Fig. [3] For simplicity, we specialize the discussion to 
c(z, gi) = f(\\z — gi\\), where / is increasing, and denote 
the generalized Voronoi cells by v/(G,w). If, at period k, 
the event is located at Zk, a possible choice for the stochastic 
supergradient is simply 



L {v/(e,u. fc _ 1 )} 



ax, 



{vd{g,w k -!)} 



(27) 

Computing component i of ( f27] > relies on testing if Z k £ 
V/(Q, ,Wk-i), which is much easier than computing the P z - 



area of the generalized Voronoi cell as in (26i. For this test, 
assuming that at least the robot associated with the region 



Ri^k-i where the k event occurs detects the event, the agents 
can simply run the FloodMin algorithm (see Subsection 
II-D i with di = f(\\Z k - gi\\) - w ijfc _i (and d { = +oo if 



agent i did not detect the event). The following result is now 
a direct application of Theorem [2] 

Theorem 5. Assume that the stepsizes in Algorithm 1 
satisfy ( |72| ), and that Assumptions ^4|7] ^4[2] of Theorem 4 
are satisfied for c(z,gi) = f(\\z — gi\\). Then the weights 
updated by following Algorithm^converge almost surely to a 
maximizer w* of \24^ , and the resulting generalized Voronoi 
diagram {V?(Q, tf*)}ie[n] satisfies the utilization constraints 



Algorithm 1 Adaptive partitioning algorithm 

Require: for robot i: its desired utilization rate a^, and the 

function / such that c(z,gi) — f{\\z~gi\\) in (22 i. 

Initialization: for i £ [n], Wi 0. 

When event k > 1 appears at location Z k : 

• Run the FloodMin algorithm starting with dj = 
f(\\Z k -gj\\) -Wj,j £ [n]. 

• if robot i terminates FloodMin with di = di then 

Wi <r- Wi +7fc-i(a 4 - 1) 
. else 

■7fc-l a i- 



Wi <- Wi 
end if 



V. Adaptive Dynamic Vehicle Routing 



We now combine the algorithms of Section III-C and 
Section |IV] to design an adaptive algorithm for the Dynamic 
Traveling Repairman Problem (DTRP). Assume for simplicity 
in this section that the environment is planar, i.e., q = 2. In the 
DTRP fl3) , events appear in the workspace Q according to a 
space-time Poisson process with rate A and spatial distribution 
P z . We assume as in Section |III-C| that Q is convex and 
compact. When the k th event appears at time t k , a robot 
needs to travel to its location Z k to service it. The robots 
travel at velocity v according to the kinematic model ([T]). The 
time that the k th event spends waiting for a robot to arrive at 
its location is denoted W k . The robot then spends a random 
service time S k at the event location, where the variables S k 
are iid with finite first and second moments s, s 2 . The system 
time of event k is defined as = W k + S k , k > 1. The goal 
is to design policies for the robots that minimize the steady- 
state system time of the events S = limsup^^ Let 
p = Xs/n denote the load factor, i.e., the average fraction 
of time a robot spends in on-site service fl5) . Policies for the 
DTRP are usually analyzed in two limiting regimes, namely in 
light traffic conditions (A — > + ) and heavy traffic conditions 
G>->1-). 



The policies for the DTRP initially proposed in |13|-|15| 



require the knowledge of the event distribution P 2 . The recent 
references fl8) , (22) propose algorithms for the DTRP that 
work without this knowledge in the light traffic regime, but 
leave open the adaptive problem in heavy traffic. The following 
sections make two contributions to the DTRP. First, in the light 
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Fig. 3. Partition for 10 robots after 1000 events for the quadratic cost 
c(z,gi) = \\z — gi\\ 2 . The partition at each step is a power diagram. 
The desired utilization rates are shown for each agent on the figure. The 
power diagram generators used are represented as black dots in the lower left 
corner. Note that fixing their positions determines the orientation of the cell 
boundaries (31| . The power cells shown in red are computed using CGAL 
1 43 1, but need not be computed by the agents running Algorithm[T] The bottom 
left figure shows the evolution of the empirical utilization frequencies over 
the first 1000 events, and the bottom right figure the evolution of the weight 
vector Wk- The chosen stepsizes were 7^ = 10/(1 + O.Olfc). 



traffic case, we use the adaptive coverage control algorithm of 



Section LU-C to obtain an adaptive policy that is simpler than 
the solutions proposed so far (18) , (22 1 and provides the same 
convergence guarantees. Second, for the heavy traffic case, 
we present the first fully adaptive policy for the DTRP that 
provably stabilizes the system as long as it is stabilizable, in 
the absence of knowledge of P z . This policy relies on the 



adaptive partitioning algorithm of Section IV 



A. Light Traffic Regime 

Note first that we always have (14) 

£ > min £ n (p) + s, 
p 



(28) 



where £ n (p) is defined by Q for f(x) — x/v. This bound is 
tight in light traffic conditions fl3) , fl5) , and achieved by the 
following policy. Let p* — [p*, . . . ,p*J 6 Q" denote a global 
minimizer of £ n , called a multi-median configuration. In the 
absence of events, vehicle i waits at the reference position p* . 
When an event occurs, the agent whose reference position is 
closest to the event location services it. It then travels back to 
its reference position p*. As A->0 + , the agents are at their 
reference configuration p* when a new event occurs, and this 
policy achieves the bound (|28]i |15|. 



To obtain an adaptive version of the above policy, we can 



use the coverage control algorithm of Section III-C to find 



a local minimizer of £ n . In the absence of an event, each 
robot waits at its current reference position pi : k- When the k th 
event occurs at Zj., the robot whose current reference position 
is closest to Zfe, say robot j, services the event, updates its 

1 Z k — Pj.fc-i 



reference position to pj ^ = ITq 

and travels back toward pj^- Reasoning as in 1 13 1, (15), in the 



Pj,k-i + Jkjj 



pj,k-i\ 



light traffic case where A — > 0, the agents are at their reference 
positions when a new event occurs. Hence the resulting policy 
achieves a steady-state system time of £ n (p) + s, where p 
is a critical point of £ n to which the stochastic gradient 



algorithm (18 1 converges under the assumptions of Theorem 



[3] For n = 1, it achieves the minimum system time since £\ is 
convex. A similar guarantee is provided by the adaptive light 
traffic policy described in (22) , at the expense of a significantly 
more complex algorithm where the robots keep track of all 
past locations visited. Note that these policies turn out to be 
unstable as the load factor p increases (15) , which motivates 
the heavy traffic policy of the next section. 



B. A Stabilizing Adaptive Policy 

Policies adequate for the heavy-traffic regime but requiring 
V z to be known are described in, e.g., |[T5], (18), (44) , 



1 45]. The following non-adaptive policy, although not the 
best available, stabilizes the system in heavy-traffic, i.e., as 
p -> 1~ (18), (44), (45). We partition the workspace Q into 
n regions {i?i}ig[ ra i such that P z (i?j) = l/n,i € [n]. Robot 
i only services the events occurring in region i?^ It does so 
by forming successive traveling salesman tours (TSP tours) 
through the event locations falling in this region, and servicing 
the events in the order of the tours. Recall that a TSP tour 
through a set of points is the shortest (here, for the Euclidean 
distance) closed tour visiting each point in the set once. While 
robot i services the events in a given tour, new events can 
occur in region Ri and are backlogged by the robot. Once 
a tour is finished, the robot forms a new tour through the 
backlogged events and starts servicing them. Assuming that 
¥ z has a density <f> z , it is known that this policy achieves 
the following bounds on the system time in heavy-traffic [18, 
theorems 4.2, 6.4] 



C* 



< lim (1 - p) 2 E < 



2C* 



P-s-i- 



(29) 



where C* 



C 



\(f Q <f> z (z) 1/2 dzj 



and C « 0.253. 



In addition, the right-hand side of ( |29| > can be changed to 
2C*/n 2 if V z is the uniform distribution on Q (l8j. Now 
consider the adaptive version of this policy described in 
Algorithm [2j which partitions the workspace as in Section 
|IV| and works without the knowledge of any event process 
parameter such as A or ¥ z . 

Theorem 6. The adaptive policy of Algorithm [2] stabilizes 
the system as long as p < 1 and achieves a steady-state 
system time satisfying the heavy traffic performance bound 
(29). Moreover if n — 1, this adaptive policy is also optimal 
in the light traffic regime A — > + . 
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Proof: As p — > 1, the region of each robot is never 
empty and hence the robot never enters the mode in Algorithm 
[2] where it goes toward its reference position p. t jl8|. By 
Theorem [5] the space partition allocating events to robots 
converges as k — > 00 to a power diagram {Ri}i^\ n ] sucn 
that ¥ z (Ri) — 1/n. Hence the adaptive policy behaves in 
steady-state as the non-adaptive policy and satisfies ( |29] l. In 
the light traffic regime (A — > 0) and in steady-state, each agent 
following Algorithm [2] is at the median of its region Ri when a 
new event occurs. Indeed the updates of the reference position 
Pi can be viewed as a stochastic gradient descent algorithm 
for the cost gt(pi) — E[\\pi — Z\\\Z £ Ri] (notice here 
that the space partition evolves independently of the reference 



locations pi, i £ [n], in constrast to Section III-C 1. In particular 
if n = 1, there is just one cell and E\ is strictly convex, so 



the policy achieves the performance bound (28 1. 



Algorithm 2 Adaptive DTRP algorithm. Robot i updates a 
weight Wi £ 



as in Section IV a reference position pi £ K 9 , 
and two sets of event locations Oi and Vi- It is also associated 
to a fixed point gi £ Q, with 7^ gj, for i 7^ j. 

Initialization: for i £ [n], Wi <— 0, pi <— robot i's initial 
position, Oi <- 0, V l <- 0. 
When event k > 1 appears at location Z^: 
• Run the FloodMin algorithm starting with dj = 



9j\ 



w 



.) ■ 



J G [n]. 



if robot i terminates FloodMin with d± = di then 

Wi <- w, + 7fe-i(n - l)/n, Pi 

n Q r 

else 

w. 



, Oi <- d U {Z fc }. 
7fc_i/n, and Pi,Oi remain unchanged. 



. end if 

In parallel, execute the following process forever for each 
robot i £ [n] 

1) When Vi = Oi — 0, robot i travels toward pi and stays 
there if pi is reached. 

2) When Vi = and O t ^ 0, then let V l <- 0;, O; <- 0. 
Compute an Euclidean TSP tour through the points V%. 

3) When Vi ^ 0, then service the locations in Vi in the 
order of the TSP tour, removing them from Vi when 
they are serviced. At the end of the tour, we are back 
in the situation V% = 0. If Oi = 0, go to 1), otherwise, 
go to 2). 



VI. Conclusions 

We have discussed robot deployment algorithms for cover- 
age control, spatial partitioning and dynamic vehicle routing 
problems in the situation where the event location distribution 
is a priori unknown. By adopting the unifying point of view of 
stochastic gradient algorithms we can derive simple algorithms 
in each case that locally optimize the objective function 
(globally in the case of the partitioning problem). The coverage 
control and space partitioning algorithms are combined to pro- 
vide a fully adaptive solution to the DTRP, with performance 
guarantees in heavy and light traffic conditions. 



Among the issues associated with stochastic gradient al- 
gorithms, we point out that they can be slower than their 
deterministic counterparts and that their practical performance 
is sensitive to the tuning of the stepsizes 7^. Many guidelines 
are available in the literature on stochastic approximation 
algorithms for the selection of good stepsizes and possibly 
iterate averaging, see e.g. p5| , (37j. In addition, if some 
prior knowledge about the event distribution is available, it 
can be leveraged in a straightforward hybrid solution that first 
deploys the robots using a deterministic gradient algorithm. 
Once the robots have converged, the adaptive algorithm is 
used to correct for the modeling errors and environmental 
uncertainty, exploiting actual observations. Note also that the 
stochastic gradient algorithms can still be used if the distribu- 
tion P z is known, by generating random targets artificially 
and essentially evaluating integrals such as <|3j by Monte- 
Carlo simulations pT) . However, this method is generally only 
advantageous for dimensions q sufficiently large. 

Appendix A 

Convergence of the Coverage Control Algorithm 

In this appendix we collect a number of useful properties 
of the gradient system 



p = -V£ n (p), P (Q)£Q n \D r 



(30) 



where the distortion function £ n is defined in and QcM' 
is convex and compact. As discussed below, this ODE is well 
defined on Q™ \ D„. We also consider its extension to Q™ in 
the form of the differential inclusion 



p£H n ( P ), p(0)eCT 



(31) 



where the the set-valued map % n is defined in pO) . Note that 
V£„ is piecewise continuous and H„ can in fact equivalently 
be defined as |29] p.51] 

'{-VE n (p)} if pi D n , 
r rln(p) = { co{lim fc ^. 00 (-V£ n (pfc))|pA; -> P as k ->• 00} 

if P G D„. 

(32) 

Following the ODE method |34|, we can characterize the 



asymptotic behavior of the algorithm ( 1 8 1 as in Theorems 
[Tj and [3] by studying the properties of these continuous- 
time dynamical systems. We assume as in section III-C that 
/ : K>o — > K>o is increasing and continuously differentiable. 
We refer the reader to [8], p0| , pT) , J46[ for previous work 
on the gradient system (30i. In particular, [41 1 provides some 
convergence results for algorithm ( 18 1, and points out that 



the non-differentiability of £„ creates technical difficulties in 
the convergence proofs. We handle these issues by initially 
considering the differential inclusion ( f3T| instead of the ODE 
(f30T). 



A. Differentiability Properties of £ n 

The first task is to prove the Lipschitz continuity and differ- 
entiability properties of the function £ n stated in Proposition 
[Tj We follow the argument of j 41 Proposition 9]. Let us begin 
with some preliminary lemmas. 
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Lemma 1. For every z G Q, the function p — > e(p, z) := 
min ig [„]{/(||z — is uniformly Lipschitz continuous, with 



\e(p,z) -e(p',z)\ < max /'(a;) max||pi-p-||, 

\:cG[0 : dmm(Q)] / i£[n] 

Vp,p' e Q", Vz e Q. 

Proof: Let p.p' G Q". Denote S = {pi,. . . ,p n } C Q, 
and similarly for S' . We have e(p, z) = f(d(z,S)). Then 

\f(d(z,S))-f(d(z,S'))\ 

< max f'(x) I | min \\pi — z\\ — min ||p' — z\\\. 



:ce[0,diam(Q)] 



Fix z G Q, and denote min ie r„i \\pi — z\\ 



z\\ and 



ye[n] \\Pj ~ A 



z||. Then 



min \\ Pl - z\\ - min |jp^- - z|| < ||p io - z\\ - ||p^ - z\\ 

l£[n] j£[n] 

< \\Pj -Pj \\ < max |bi -p-ll, 



and 



mm p- 

je[«] J 



min ||pi - z|| < \\p'. - z| 



ie[n] 



llPio - Z \\ 



< ||P*o -Pioll ^ max HP* "Pill 



This proves the lemma. 



Lemma 2. Lef e : E m x Q 4 M a function and C K m 
foe an open set such that, for all p G O, for all i G [n], the 
partial derivative z — > de/dpiij), z) exists ¥ z -almost surely. 
Moreover, assume that there is a constant C and a norm || • | 
on R m such that p — > e(p, z) is uniformly Lipschitz, i.e., 

\e(p,z)-e(p,z)\<C\\p-p'\\, Vp,p'GO, Vz G Q. 

Then the function p — ¥ Jq e(p, z)¥ z (dz) is differentiable on O 
and we have, for any p G O, and for each i G [n], 

A / e(p,z)¥ z (dz) = [ ^- e {p,z)V z {dz). 
OPi Jq Jq Opt 

Proof: Denote F(p) — J Q e(p, z)¥ z (dz), and e(p, h, z) — 

[e(p+h ei ,z)-e(p,z)] rpjjen 



lim gVLM^Xg) = Um / e(p,h,z)F z (dz). 

Now for all ft sufficiently small and all z G Q, we have 
|e(p, h, z)\ < C by the Lipschitz continuity assumption. Hence 
by the dominated convergence theorem, 



— / e(p,z)P z (dz) = hm 

dpi Jq h->0 



ft 



lim / e(p, ft., z)P z (<iz) 



lim e(p, ft, z)P z (<iz) 
^-e(p, z)P*(dz). 

OPi 



We now prove Proposition [T] 




Fig. 4. Vector field for the gradient system (30) , with two agents evolving 
on [0, 1] and P z uniform on [0, 1]. The discontinuity on the line x\ = X2 
occurs when the two agents switch side, from x\ < X2 to x\ > X2- Note that 
the vector field is symmetric with respect to this line. The equilibrium occurs 
at a unique geometric point on the line, namely (1/4,3/4), corresponding 
to two stationary points for the flow, one for each ordering of the robots. 
The differential inclusion \M) has an additional (unstable) equilibrium at 
(1/2,1/2). 



Proof of Proposition |7J The inequality 



\£ n {p) ~ £n{p')\ < max f'(x) maxHpi-p'J, 

\zG[0,diam(Q)] J ie[n] 

is immediate from Lemma [T] so £ n is globally Lipschitz 
continuous on Q n . Assume now that p G Q" \ D„. Note that 

for z ^ VJ™ =l dVi(p) and z ^ pi,i G [ri], we have 



Since the set where this formula does not hold has P z -mesure 
zero under our assumption (recall that dVi(p) consists of 
subsets of hyperplanes), £ n is differentiable at p and |5]l is 
a direct consequence of Lemma [2] Moreover, £ „ is in fact 
continuously differentiable at p if and only if the partial 
derivatives d£ n (p) / dpi are continuous at p, for all i G [n]. 
Since we assume that /' is continuous on Q, the function 
P -> I{p,z) ■= f'(\\Pi - z||) ||p*I*|| lvt( P )(z) is continuous on 
Q" \ D n for P z -almost all z G Q (i.e., for z <£ \J^ =1 dV l {p) and 
z 7^ pi,i G [n]), and moreover z I{p, z) is bounded on the 
compact set Q. Hence the continuity of the partial derivatives 
(|5]> is a consequence of the Lebesgue dominated convergence 
theorem. ■ 
From Proposition [T] the function £ n is continuously differ- 
entiable on M"\D„. In general however, V£„ is discontinuous 
on the set D„, see Fig. [4] To discuss more precisely the 
behavior of the gradient of £ n as we approach the set D„, 
define 



N{x) 



\V£ n (x) 



if x G 



liminfygR^D^,^ \\\7£ n (y)\\ 2 if x G D„. 



Note that because V£„ is continuous on W 1 \ D„ the two 
definitions of TV in fact coincide on this set. 

Our goal is now the result of Proposition [2] below, whose 
proof follows that of pT| Lemma 30]. We introduce first the 
notion of Voronoi aggregates, which occur when several robot 
positions coincide. If x = [x\, . . . , x n ] G D n , there is at least 
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one set J C [n],|J| > 2, of components such that for all 
i,j £ J, Xi = Xj =: x,j and for all i £ J, j ^ J,Xi ^ Xj. 
The set J is called an aggregate of components of x, and 
x £ D„ can have several aggregates. The aggregate J is then 
associated to the Voronoi cell Vj(x) := {z £ M. q \ \\xj — z\\ < 
\\x k - z\\,Vk £ [n]}. Let p^ e Q" \ D„ be a sequence 
converging to p £ Q™. Consider the vectors 



,(fc) _ 



(fc) _ „(fe) 
(fc) _ „(*) 



« 7^ J- 



Since the unit sphere is compact, we can extract a subsequence 

(k) 

so that the unit vectors u^' converge, even if i,j belong to 
an aggregate. We will need the following technical result. 

Lemma 3. Let p( k ' be a converging sequence as above such 

(k) 

that the vectors converge, for all i ^ j. Denote itu = 



limi 



.(* 



. Moreover, define for that sequence the sets 



At := {z £ Q\3k z s.t. z £ Vi(p ik) ) for all k > k z }, i £ [n]. 

(33) 

Let zq £ Q. Then if hyperplanes in R q have V z -measure zero, 
we have 

ly.( p (fc)) (zq) — > lA i (zo) as k — > oo,P 2 - almost everywhere. 

(34) 



Proof: If zq £ Ai, then ( |34[ > is clear. Otherwise, there 
exists (at least) two indices i ^ j such that 

z £ A i3 := {z £ Q\z £ V t (p {k) ) and z £ V 3 {p {k ' ] ) 

for infinitely many k, k'}. (35) 

Hence there are subsequences h(k) and lz{k) such that 



\\P 

and \\p 



h(k) 
i 

h(k) 



zo\\<\\p^ k) -zo\ 
zo\\<\\p* {k) ~zo\ 



Vfc > 0, 

Vfc > 0. (36) 



Let us first assume that the indices i and j do not belong to an 
aggregate. Letting k — > oo in each subsequence, we obtain that 
\\pi — zqII = \\Pj ^ z q\\, hence zo belongs to the hyperplane 
perpendicularly bisecting the line segment (pi,pj). But this 
hyperplane has P z -measure zero by assumption. Now assume 
that i,j belong to an aggregate J and consider the vectors 



,(fe) 



P 



(k) (fc) 
■Pi 



(fc) Jk) 



IIP 



Pi 



Then from ( 36 1 we get respectively 



(Zi(fc)) (h(k))\ ^ f, W1 ^ n 

U) A , Zq — Pj ) < 0, VK > 0, 



Proposition 2. Suppose that Assumption [7] holds and that 
P z dominates the Lebesgue measure on Q n . Then we have 
N(p) > for all p £ D„ fl Q™. Hence there exists S > 

SMc/l f/lflf 

inf ||V£„(p)|| 2 =: k > 0. 

P eQ"n(S(D„nQ",<5 )\D„) 

Moreover, there exists B < oo such that \\y\\ 2 < B for all 
y £ l-ln{p) and all p £ Q™. 

Proof: Denote D„ := D„ n Q™. Let p £ D„, and J be 
an aggregate for p. By the characterization of lower limits, 
consider a sequence p^ £ Q" \ D„ converging to p and 
such that ||V£„(p fc )|| 2 — > N(p) as k — > oo. Up to taking a 
subsequence, we can assume that the sequences of unit vectors 

u\j := jjjrj — j-jjy-jy converge for i, j in the aggregate, so that 



,(fc) 



> v,ij,Vi ^ j. 



We are now in the situation of Lemma [3] Next, consider the 
quantity 



I k = 



i{^(p(*))}(«)/'(lbi k) - z||) ( w 



,(fe) 



jfc) 



£0 



for i, j £ J. It is easy to see that for z £ Vj(p k ), the inner 
product inside the integral is nonnegative. By the dominated 



convergence theorem, and using ( 34 1, we have 



f(\\pj-z\\)(u 



PJ 



PJ\ 



>0 



On the other hand, we also have by |5]) 

(k) d£ n 



,(dz). 



(37) 



'Q 
and so, 

lim 



-(v (k) ) 

1 {V ] (pW)}( z )f(\\P, 



(fc) 



*ll) 



,(*) 



.(fc) 



Pz(rfz), 



,(fc) 



d£ n 
dp 3 



(P (k >) 



'■>.) > 



lim — 



d£n 

dp 



again by the dominated convergence theorem. Hence if 
limfc_j. 00 — ^ L {p k ) = 0, then Zy = 0. But since the inner 



and (ul? (k) \z -pf 2m ) > 0,Vfc > 0. 



(uij,z - pj) = 0, 

and so zq must belong to the hyperplane perpendicular to Uij 
passing through pj. Again, this set has measure zero. Overall 
( |34) > hold everywhere except perhaps on a finite number of 
hyperplanes, hence it holds P z -almost everywhere. ■ 



product inside the integral in (37i is zero only on the hy- 
perplane perpendicular to u^j passing through pj, it must be 
strictly positive outside of this hyperplane. Also, we assumed 
Letting again k -> oo in the subsequences, and since we that /' is strictly positive. This implies that P Z (A/) = 0. This 
assumed that {u\f} converges, we obtain holds for all j G J if N(p) = 0. 

But now take z £ Vj, the aggregate Voronoi cell. Then 
z £ Aj for some j £ J, or z £ Aij as defined in ( 35 1, 
for some i ^ j. The set of points satisfying the second 
condition has measure as shown in Lemma[3] In other words, 
Vj = U jeJ Aj U S, with P z (5) = 0. We obtain therefore 
P z (Vj) = 12jej^z(Aj) = 0. But this is impossible since 



UNIVERSITY OF PENNSYLVANIA, ESE TECHNICAL REPORT 



13 



Vj is a polygon with non-empty interior, which has positive 
Lebesgue measure. Therefore N(p) > 0. 

Next, recall the following definition of the lower limit |47 
P-8] 



lim inf f(x) 



sup 



inf fix) 



where Mix) denotes the set of neighborhoods of x. Now if 
N(p) > for all p £ D„, then by the characterization of 
the supremum, we have that for all p 6 D„, there exists a 
neighborhood V(p) £ W(p) such that 

N(p) 



< 



< inf 

pev(p)\{p} 



\V£ n (pW <N(p). 



Without loss of generality, we can assume that V(p) is open. 
Then Upe6 ^(p) f° rms an open cover of the compact set 
D n , hence we can extract a finite subcover V(p 1 ) : . . . , V(p ). 
Clearly this finite subcover is again a neighorhood of D„, 
hence it contains _B(D„,(5 ) for some 8 > 0. Since TV is a 
lower semi-continuous function, it attains it minimum N* on 
D n . We then have k > N* /2 > in the proposition. 

Finally, the fact the N(p) < oo follows from |5]). This 
immediately gives the last part of the proposition by definition 
of the set-valued map T~L n . ■ 

B. Trajectories of the Gradient System 

We now turn to the study of the trajectories of the ODE 
( |30l > and the differential inclusion ( |3"Tj >. 

Proposition 3. Suppose that Assumption [7] holds and that P z 
dominates the Lebesgue measure on Q n . If Xq £ Q" \ D„, a 
trajectory t — » x(t) of the ODE (30) with x(0) = Xq remains 
in Q n \ D„, i.e., for all t < oo, x(t) £ Q" \ D„. Moreover, it 
converges to a compact connected subset of {x £ Q n \ D n : 
V£„ (x) = 0}. Finally, a trajectory of the differential inclusion 
\31\ starting from Xq £ Q n remains in Q n . 



Proof: First, consider the situation where a robot i is on 
the boundary of the workspace, i.e., pi £ dQ. Assume first 
that the agents are separated, i.e., p £ Q™ \ D„. For a point 
x £ dQ, let us denote the normal cone at x 

C%(x) = {v£ R q \ (v,x-x) <0,Vx£ Q}. 

Now let v £ C%(pi). Then 

z-p, 



d£ n , \ 
dp t 



Vi(p) \ \\ z -PiW/ 

< 0. (38) 



Hence the vector —d£ n /dpi belongs to the polar of the normal 
cone, i.e., to the tangent cone of Q J48| Cor. 5.2.5], and so the 
trajectories of the agents do not leave the workspace Q. The 



inequality (38 i is preserved by taking convex combinations, 



and holds at points p £ D„ for the differential inclusion (31 
as well. 

Next we show that a trajectory starting outside D„ never 
meet D„. Assume that for some t* £ (0, oo), we have p(t*) = 
p* £ D n . Let J be an aggregate of p* , so that 



Pi =Pi 



: p*jyi,j £ J- 



Pj(t) -Pi{t) 
\\Pj(t) ~ PiitW 



Define, for all i ^ j in J 

<Pij(t) = \\Pj(t) ~ Pi(t)\l : 
and note that 

<j>ij(t) = (Uij(t),pj - pi) 

= («ii(*).-^(P(*)) 
Next, consider the following quantity, for i ^ j £ J, 

/«(*) 

lv5( P (t))(^)/'(lbi ~ z\\) ( u ij{t), ^ ^^ V z {dz) 



d£ n 
dpt 



(P(t)) 



>0 

and note that because we integrate over the Voronoi cell of 
j, the inner product and thus Iij(t) are nonnegative, for all 
t. We then argue as in the proof of Proposition [2] Consider 
a sequence of times {tk}k with tk — > t* as k — > oo. Up to 
taking a subsequence, we can assume Uij(tk) converges to 
some unit vector By the dominated convergence theorem, 
defining Aj as in (33 1, we have 



lim l}At k ) 



lA 3 (z)f(\\pj-z\\)lu, 



PJ 



\ z -pj\ 



W z (dz) 



lim 



Ui 3 it k ),-—" L (p{t k )) 
dp. 



where the last equality follows by another application of the 
dominated convergence theorem. Similarly, defining 



4-w 



= J q W t wt))(z)f'Q\Pj - 4) {unit), || _ z || 



V z {dz), 



we have 



lim Ifjtk) 



l Az (z)f(\\pj-z\\)(u 



PJ 



\\PJ 



V z (dz) 



lim 



d£ n 

Uij(tk), -g^iP^k)) 



As in the proof of Proposition [2j because we assumed that 
F z dominates the Lebesgue measure, one of the sets A t must 
have P z (Ai) > since P z {Vj) > 0. This gives, with our 
assumption that hyperplanes have P z -measure zero, 

lim <fiij(t k ) > Vz,j, and lim 4>ij{t k ) > 

k— >oo k— >oc 

for at least one pair i, j. 

Thus there exists i,j £ J such that liminf t _j.t* <fiij(t) > 0. 
Therefore (j)^ (t) is positive for t < t* close enough to t* . But 
this contradicts the fact that <pij(t) = \\pi{t) — Pj(t)\\ — > as 
t t*. 
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Finally, the convergence of the trajectories of the ODE to 
a compact connected invariant subset of {x G Q" \ D„ : 
W£ n (x) — 0} follows from Lasalle's invariance principle. ■ 

We can now show that the trajectories of the ODE never 
stay in B(D n n Q n , So) for a long time. 

Corollary 1. Suppose that Assumption [7] holds and that P 2 
dominates the Lebesgue measure on Q". Let So > 0, K > be 
defined as in Proposition^ x G Q" n (B(D n n Q™, S ) \ D„), 



and let T 



;Q"nB(D„nq",^) 



Then a trajectory of 



the ODE passing through xo at time t\ must exit B(D n fl 
Q™, S ) at some time t 2 < t\ + T. 

Proof: We have, for t > t-y and as long as the trajectory 
t -» x(t) remains in B(D n n Q™, So) \ D„ 

< £ n (x(t)) = £ n (x ) - f \\V£ n (x{s))\\ 2 ds 

Jt! 

< max £ n (x) — K,(t — t±) . 

xeQ"n(B(D„nQ",5 )\D„) 

Hence the trajectory must exit £?(D„nQ™, <5o)\D„ at or before 
the time given in the theorem. But we know by Proposition 
[3] that it cannot hit D„ at t% < 00. Hence it must in fact exit 
£(D„nQVo). ■ 
The set C„ defined in (19 1 contains the set of limit points 



of the ODE (30 1 by Proposition [3] From the definition of the 
set-valued map H n , the set C of limit points of the differential 
inclusion (3T) consists of the set of limit points of the ODE 
( |30"| > together with the limit points of the sliding trajectories 
that start and remain on D„ (since a trajectory leaving D„ 
does not converge to D n by Proposition 01. Hence C C C„ U 
(D n n Q"). Moreover, we know by Proposition [2] that C„ C 
Q" \ B(D n , So) if P 2 dominates the Lebesgue measure. 

C. Convergence of the Adaptive Coverage Control Algorithm 

We now prove the main convergence theorem for the 
adaptive coverage control algorithm. 

Proof of Theorem p5j First, for the proof of convergence, 
we can ignore the projection Hq in (18 1. In general, the 
analysis involves the corresponding projected ODE or pro- 
jected differential inclusion, see (37), (36| chapter 5]. Note 
however from Proposition [3] that at any boundary point of Q™, 
the velocity vector of the unprojected differential inclusion 
is already in the tangent cone of Q". Hence the projection 
step does not change the continuous-time dynamics and the 
convergence properties remain the same as for the unprojected 
algorithm. Moreover, the saturation function does not change 
the convergence properties either |37, Section 1.3.5]. 

Now the fact that with probability one, a sequence of iterates 
of ( p"8j ) converges to a compact connected invariant set of the 
differential inclusion (31 1 is standard, see, e.g., (36| chapter 
5], (37] Theorem 8.1 p. 195]. Consider a sample cj such 
that {p k (u)} converges to such a set, denoted S. In view 
of Proposition 0J we have S C Q™. Suppose that S is not 
entirely contained in D„, and take a G S \ D„. Then a 
trajectory of the differential inclusion passing through a at 
t = is in fact a trajectory of the ODE ((30| for t > 0, 
by Proposition [3] Because S is invariant, we must then have 



£ n (a) := — 1| V£„(a)|| 2 = 0, i.e., a € C„. This proves the first 
part of the theorem. 

If P 2 dominates the Lebesgue measure, then we know 
that C„ and D„ are disconnected by Proposition |2j so S is 
contained in one of these sets. Recall that under Assumption 
[T] w e can assume that almost surely, the iterates {pk}k>o of 
(18 1 never hit D„. Choose the sample w above in this set 



of probability 1, and recall the definitions of <5o and T from 
Corollary [T] Suppose now that S C D„. Then there exists fco 
such that for all fc > fco. Pk G B(D n , Sp/4). For any k > 0, 
denote by x k (-) the solution of the ODE ( 30 1 starting at pk 
(i.e., x k (0) — pk). Also, denote by p the piecewise linear 
interpolation of the sequence pk with stepsizes 7^. 

Then by [36} Chapter 2, Lemma 1], there exists fci > fco 
such that for all fc > fci, we have sup te j tfc tk +T\ \\P{t) ~ 
x k (t)\\ < S /4, where t k := 7z- m particular, \\p(t k + 

T) - x k (t k + T)\\ < So/ '4. Now remark that by Corollary 
[I] we must have d(x k (t k + T),D n ) > So- By possibly 
increasing fci, we can assume that there is an iterate pz with 
fc > fc such that \\p-j, - p(t k + T)\\ < 5 /4. So we have 
\\p k -x k {t k +T)\\ < So/2, hence d(p~ k , D n ) > S /2. But this 
contradicts our assumptions that pj, G B(D„ , So/ 4). Hence we 
cannot have S C D„ and so S C C„. This finishes the proof 
of the theorem. 



Appendix B 

Space Partitioning and Optimal Transportation 

In this section we prove Theorem |4j which forms the 
basis for the stochastic gradient Algorithm [T[ partitioning 
the workspace between the agents. Compared to the results 
presented in the recent papers jS), p9) , this theorem makes 
weaker assumptions on the cost function c(z,g) and on the 
target distribution P 2 . The main tool on which Theorem [4] 
relies is Kantorovich duality (26). See also (27), (49), (50 1 for 
related results. 

Proof of Theorem [5f We start by relaxing the optimiza- 



tion problem (22 1, (23 1 to the following Monge-Kantorovich 
optimal transportation Problem (MKP) |26|. Let P2 = 

J2™ =1 a>iS gi , so that (23 1 can be rewritten P z o T^ 1 = ¥2- 



We consider the minimization problem 

min / c(z,q)dir(z,q), 

where A4(V Z ,P2) is the set of measures on Q x Q with 
marginals P 2 and P2, i.e., 

n(A xQ)= P 2 (A), tt(Q x B) = P 2 (B), 

for all Borel subsets of A, B of Q. In other words, we 
are considering the problem of transferring some mass from 
locations distributed according to P 2 to locations distributed 
according to P2, and there is a cost c(z,g) for moving a unit 
of mass from z to g. Then ir is a transportation plan from the 
initial to the final locations, assuming that we allow a unit of 
mass to be split. The case where this splitting is not allowed, 
i.e., where we restrict ir to be of the form 

dx(z,g) = dV z (z)S T(z) (g), 
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for some measurable function T, is a Monge Problem (MP) 
[51 1, and is exactly our problem (22 1, (23 i. In our case where 
the target distribution P 2 is discrete, [52, Theorem 3] shows 
that solving the MKP gives a solution in the form of a 
transference function T, i.e., a solution to the MP, under the 
assumption A|2]of the theorem, and assuming the infimum in 
( f22] i is attained. This is the case if c is lower bounded and 
lower semicontinuous and P 2 is tight [26, Remark 2.1.2], and 
this last condition is satisfied since we assume Q compact. 



Next, by Kantorovitch duality |26|, we have 



nun / c(z,q)dir(z,q) 
*eM(F z ,p 2 )J QxQ 



(39) 




sup < / 0[Z) 
>,tu)G* c [Jq 



(z) + ^ a iWi 



where $ c is the set of pairs (tp, w) with 
L 1 (Q,P 2 ), w e W\ such that 

tfi(z) + Wi < c(z,gi), 



Q 



in 



(40) 



for P z -almost all z in Q and for all i in [n]. Now for any 
w £ R, define the function w c : Q — > K such that 

w c (z) = min{c(z, g^) — u>i}. 

iS [n] 

From the definition of <P r , we can then without loss of 



generality restrict the supremum on the right-hand side of ( 39 1 
to pairs of the form (w c , w). Combining this with the previous 
remark on the Monge solution to the Monge-Kantorovitch 
problem, we get 



mm / c(z,T(z))¥ z (dz) 



(41) 



= sup < / mm{c(z,gi) - Wi}P z (dz) + y2a i w l > . 
wen." [jq^eN i=1 J 

Hence the value of the optimization problem is equal to the 



supremum of the function h defined in ( 24 1. The fact that the 
supremum is attained in the right hand side of ( |4T| ) follows 
from, e.g., (26] Theorem 2.3.12] under our assumption A[T]for 
c. 

The function h is concave since w — > min iS [„]{c(z, gi) — 
Wi] is concave for all z as the minimum of affine functions, 
and the integration with respect to z preserves concavity. 
Finally, for w 1 ,w 2 € R n , we have 



h(w 2 )-h(w 1 )= / mm{c(z,g l ) - w 2 } ¥ z (dz) 
Jq »£[«] 



- / mm{c(z,gi) - wl} F z (dz) + di(w 2 - wl). 
J Q ie[n} ^ 

Denoting T 1 an assignment that is optimal for w 1 (given by 
a generalized Voronoi partition), we have then, for all z € Q, 



min{c(z,g 4 ) - w 2 } < c(z,T 1 (z)) 

ie[n] 



and so 

n 

h(w 2 ) - hiw 1 ) < -^Tp 2 (^(sV))k 2 - «*) 

n 

4=1 

But this inequality exactly says that [ai — 
V z {V{{g,w 1 )) 1 ...,a n ~F z (VZ{G 1 w 1 ))] T is a supergradient 
of h at w 1 . For the convergence of the supergradient 
algorithm, see |38 Proposition 8.2.6. p. 480]. ■ 
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